Exact Extreme Value Statistics and the Halo Mass Function

ABSTRACT

Motivated by observations that suggest the presence of extremely massive clusters 
at uncomfortably high redshifts for the standard cosmological model to explain, we 
develop a theoretical framework for the study of the most massive haloes, e.g. the 
most massive cluster found in a given volume, based on Extreme Value Statistics 
(EVS). We proceed from the exact distribution of the extreme values drawn from a 
known underlying distribution, rather than relying on asymptotic theory (which is 
independent of the underlying form), arguing that the former is much more likely to 
furnish robust statistical results. We illustrate this argument with a discussion of the 
use of extreme value statistics as a probe of primordial non-Gaussianity. 

Key words: methods: analytical - methods: statistical - dark matter - large-scale 
structure of Universe - galaxies: clusters 



1 INTRODUCTION 

The standard 'concordance' or Lambda cold dark matter
(ΛCDM) cosmological model incorporates the idea that
large-scale structure in the universe is assembled hierarchically
from Gaussian-distributed initial perturbations in the
density of cold dark matter (CDM). In hierarchical models,
structure in the universe forms in a 'bottom up' fashion,
with small-scale density perturbations collapsing first before
merging over time to form larger and larger CDM haloes
(White & Frenk 1991; Peacock 2000). Baryonic matter falls
into these haloes, becoming shocked and virialised to form
galaxies (e.g. Benson 2010). The exact details of the rate
and magnitude of structure formation are highly sensitive
to the contents and dynamics of the universe and, as such,
have the potential to constrain deviations from the minimal
ΛCDM model. Indeed, the most massive collapsed object in
the universe can on its own supply a definitive test of
cosmological models, in that the observation of a single sufficiently
massive CDM halo can rule out, at high significance levels,
models in which such a large object is unlikely
to form. In particular, the inference that extremely dense
haloes must have arisen from large upward density fluctuations
seems a promising way to probe possible departures
from initial Gaussianity.

In accord with this line of reasoning, there has recently
been considerable interest in the existence of high-mass,
high-redshift galaxy clusters as a means of identifying
deviations from ΛCDM cosmology. Since the discovery
by Jee et al. (2009) of a cluster at $z \sim 1.4$ with a
mass of $(8.5 \pm 1.7) \times 10^{14}\,M_\odot$, and other apparently
challenging objects (Brodwin et al. 2010; Santos et al. 2011;
Foley et al. 2011), several authors have reported tension
between the existence of such objects and concordance
cosmology. Jimenez & Verde (2009), Cayón et al. (2011) and
Hoyle et al. (2011) all report that this tension can be eased
by the presence of primordial non-Gaussianity, parameterised
by $f_{\rm NL}$, at levels which far exceed (by a factor $\sim 10$)
the limits imposed by the CMB (Komatsu et al. 2011).
Whilst models exist that predict a running of $f_{\rm NL}$ with scale
(Lo Verde et al. 2008), it is important to explore the robustness
of these detections before concluding that changes to
the standard model are needed. Furthermore, future surveys
will only increase the observed volume in which clusters may
exist, so the masses of the most massive clusters found will
increase accordingly.

E-mail: ian.harrison@astro.cf.ac.uk

While the motivation for focussing on such objects
is strong, in order to perform model selection with
high-mass clusters we need to understand the statistical
properties of such objects. One way of considering
this problem is through Extreme Value Statistics
(EVS) (Gumbel 1958; Katz & Nadarajah 2002), which
seeks to make predictions for the greatest (or least)
valued random variable drawn from an underlying distribution.
There has recently been a resurgence of interest
in applying EVS to the field of cosmology, with papers
by Mikelsons et al. (2009), Yamila Yaryura et al. (2010),
Colombi et al. (2011), Davis et al. (2011), Waizmann et al.
(2011) and Chongchitnan & Silk (2011), the last three dealing
with high-mass clusters in particular. In this paper we
look more carefully at the underlying theory, derive from
first principles the exact extreme value statistics of the halo
mass function, and investigate their usefulness for constraining
cosmology.

The paper is organised as follows. In Section 2 we introduce
exact extreme value statistics and show how they
may be formulated for the case of the halo mass function, in
both the ΛCDM case and one including primordial
non-Gaussianity. Section 3 compares this theoretical
prediction for the most massive cluster with Monte Carlo
simulations. In Section 4 we conclude and discuss prospects
for future work in this area.



2 METHODS 

2.1 Exact and Asymptotic Extreme Value 
Statistics 

If we consider a sequence of $N$ random variates $\{M_i\}$ drawn
from a cumulative distribution $F(m)$ then there will be a
largest value of the sequence:

$$M_{\rm max} \equiv \sup\{M_1, \ldots, M_N\}. \tag{1}$$

If these variables are mutually independent and identically
distributed then the probability that all of the deviates are
less than or equal to some $m$ is given by:

$$\Phi(M_{\rm max} \le m; N) = F_1(M_1 \le m) \cdots F_N(M_N \le m) = F^N(m), \tag{2}$$

and the probability distribution for $M_{\rm max}$ is then found by
differentiating (2):

$$\phi(M_{\rm max} = m; N) = N F'(m)\,[F(m)]^{N-1} = N f(m)\,[F(m)]^{N-1}. \tag{3}$$
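Equations (2) and (3) are simple to evaluate numerically once $f$ and $F$ are known. The sketch below is an illustration only: the unit exponential distribution, the sample size `n_draws`, the number of realisations and all function names are assumptions made for the example, not choices made in this paper. It compares the mean of $M_{\rm max}$ obtained by integrating equation (3) with a direct simulation, and with the known closed form for the exponential case (the harmonic number $H_N$).

```python
import math
import random

def exact_max_pdf(f, F, m, n):
    """Equation (3): pdf of the largest of n i.i.d. draws from f (cdf F)."""
    return n * f(m) * F(m) ** (n - 1)

# Toy underlying distribution (an assumption for illustration): unit exponential.
def f_exp(m):
    return math.exp(-m) if m >= 0.0 else 0.0

def F_exp(m):
    return 1.0 - math.exp(-m) if m >= 0.0 else 0.0

def mean_of_max_exact(n, m_hi=60.0, steps=60000):
    """Mean of M_max from equation (3), integrated with the trapezium rule."""
    h = m_hi / steps
    total = 0.0
    for i in range(steps + 1):
        m = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * m * exact_max_pdf(f_exp, F_exp, m, n)
    return total * h

def mean_of_max_monte_carlo(n, realisations, rng):
    """Mean of the largest of n exponential draws over many realisations."""
    total = 0.0
    for _ in range(realisations):
        total += max(rng.expovariate(1.0) for _ in range(n))
    return total / realisations

n_draws = 50                      # hypothetical sample size N
rng = random.Random(1)            # fixed seed so the run is repeatable
exact_mean = mean_of_max_exact(n_draws)
mc_mean = mean_of_max_monte_carlo(n_draws, 20000, rng)
harmonic = sum(1.0 / k for k in range(1, n_draws + 1))  # closed form for Exp(1)
print(exact_mean, mc_mean, harmonic)
```

The three numbers should agree to within the Monte Carlo noise, which is the sense in which equation (3) is exact for any finite $N$.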

Equation (3) gives the exact extreme value distribution for $N$
observations drawn from a known underlying distribution $f(m)$.
However, it is the seminal result of extreme value statistics
(Fréchet 1927; Fisher & Tippett 1928) that, in analogy with
the central limit theorem for sample means, even in cases
where $f(m)$ is not explicitly known, in the limit $N \to \infty$ the
distribution $\phi(m_N)$ of a suitably rescaled variable

$$m_N = \frac{M_{\rm max} - b_N}{a_N}$$

(where $a_N$ and $b_N$ are functions of $N$ determined by the
underlying distribution) asymptotically approaches one of
only three limiting forms: the Type-I, II and III (also known
as Gumbel, Fréchet and Weibull respectively) extreme value
distributions. The functions $a_N$ and $b_N$ may be determined
via the reciprocal hazard function:

$$r(m) = \frac{1 - F(m)}{f(m)}, \tag{4}$$

$$b_N = F^{-1}\left(1 - \frac{1}{N}\right), \qquad a_N = r(b_N). \tag{5}$$



It is possible to encapsulate all these asymptotic distributions
within the Generalised Extreme Value (GEV) distribution:

$$G(m_N; \gamma) = \exp\left\{-\left[1 + \gamma m_N\right]_{+}^{-1/\gamma}\right\}, \tag{6}$$

where values of the shape parameter $\gamma = 0$, $\gamma > 0$ and $\gamma < 0$
pick out Type-I, II and III distributions respectively. We
have given this distribution the symbol $G(m)$ as opposed to
$\phi(m)$ to emphasize the difference between exact and asymptotic
distributions. It is possible to determine the asymptotic
value of $\gamma$ (Gnedenko 1943; Györgyi et al. 2010), and hence
the asymptotic distribution type, but this process proves to
be analytically tractable only for simple distributions.

The shape parameter describes the form of the asymptotic
distribution $G(m_N; \gamma)$, but exact distributions $\phi(m; N)$
will still have a best-fitting value for $\gamma$. Measuring $\gamma$ from
a finite-sized sample from a distribution which is in the
domain of attraction of the Type-I extreme value distribution
will lead to a measurement which converges towards zero
as the sample size increases. For distributions lying in the
domain of attraction of Types II and III, $\gamma$ will converge to
an unknown value, depending on the form of the underlying
distribution. The rate of this convergence can be spectacularly
slow; for the specific case of a Gaussian distribution (for
which it can be analytically determined that the asymptote
is the $\gamma = 0$ distribution) convergence goes as $\sqrt{\ln N}$ only. It
is therefore necessary to be extremely careful that any observed
value (or change in value) of the shape parameter $\gamma$
is due to changes in the underlying distribution, rather than
due to the convergence of the exact distribution $\phi(m; N)$ to
the asymptotic one $G(m_N; \gamma)$.

2.2 Extreme Value Statistics of the Halo Mass
Function

We now seek to determine the statistical distribution of extreme
values for the masses of CDM haloes, and in particular
the validity of the asymptotic form (6), for realistic cosmological
volumes. Press & Schechter (1974) were the first to
provide an analytic method for predicting the co-moving
number density $n(M)$ of haloes of a given mass $M$, in differential
form $dn/dM$, considering spherical collapse of density
perturbations in the matter field. Subsequent to this, there
has been much work developing the halo mass function, both
analytically and by fitting functions to N-body simulations.
We choose to use the mass function from Sheth & Tormen
(1999), which includes effects from ellipsoidal collapse:

$$\frac{dn}{dM} = A\sqrt{\frac{2a}{\pi}}\left[1 + \left(\frac{\sigma_M^2}{a\delta_c^2}\right)^{p}\right]\frac{\bar\rho}{M}\frac{\delta_c}{\sigma_M^2}\left|\frac{d\sigma_M}{dM}\right|\exp\left(-\frac{a\delta_c^2}{2\sigma_M^2}\right). \tag{7}$$

Here, $\sigma_M^2$ is the variance of the matter field smoothed with
a top-hat window $W(k; R)$ of radius $R = (3M/4\pi\bar\rho)^{1/3}$, with linear
power spectrum $P(k)$:

$$\sigma_M^2 = \frac{1}{2\pi^2}\int_0^\infty W^2(k; R)\,P(k)\,k^2\,dk. \tag{8}$$

$\bar\rho$ is the mean density in the Universe, $\delta_c \approx 1.686$ is
the critical overdensity for collapse and $\{A, a, p\}$ are parameters
fitted to an N-body simulation and here given
their original values of $\{0.322, 0.707, 0.3\}$. Throughout, we
use a power spectrum calculated using CAMB and the
WMAP7+BAO+SN maximum likelihood parameters from
Komatsu et al. (2011). Using the halo mass function as a
predictor of number densities of haloes $n(M)$, we can construct
a probability distribution function (pdf) for halo mass
to be used in the calculation of the extreme value distribution
outlined above:



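$$f(m) = \frac{1}{n_{\rm tot}}\frac{dn(m)}{dm}, \tag{9}$$

$$F(m) = \frac{1}{n_{\rm tot}}\int_0^{m} dM\,\frac{dn(M)}{dM}, \tag{10}$$

where the normalisation factor

$$n_{\rm tot} = \int_0^{\infty} dM\,\frac{dn(M)}{dM} \tag{11}$$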



is the total (co-moving) number density of haloes. For a
constant-redshift box of volume $V$ the total number of expected
haloes $N$ is then given by $n_{\rm tot} V$. These distributions can be
inserted into equation (3) to predict the pdf of the highest-mass
dark matter halo within the volume.
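The chain from a mass function to the extreme value pdf (equations 9 to 11 inserted into equation 3) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the power law with exponential cut-off used for $dn/dM$ is a toy stand-in rather than the Sheth & Tormen form, its constants `N_STAR`, `M_STAR` and `SLOPE` are hypothetical, and the lower mass limit of the grid is an assumed cut introduced to keep $n_{\rm tot}$ finite.

```python
import math

# Toy stand-in for dn/dM (an assumption, NOT the Sheth-Tormen form):
# a power law with an exponential cut-off, in arbitrary illustrative units.
N_STAR = 1.0e-4      # hypothetical normalisation
M_STAR = 1.0e14      # hypothetical cut-off mass
SLOPE = 1.9          # hypothetical low-mass slope

def dn_dm(m):
    return (N_STAR / M_STAR) * (m / M_STAR) ** (-SLOPE) * math.exp(-m / M_STAR)

def log_grid(m_lo, m_hi, n):
    """n masses equally spaced in ln m between m_lo and m_hi."""
    step = (math.log(m_hi) - math.log(m_lo)) / (n - 1)
    return [math.exp(math.log(m_lo) + i * step) for i in range(n)]

def cumulative_number_density(masses):
    """n(<m) on the grid, by the trapezium rule in ln m (dM = m dln m)."""
    cum = [0.0]
    for lo, hi in zip(masses[:-1], masses[1:]):
        area = 0.5 * (lo * dn_dm(lo) + hi * dn_dm(hi)) * math.log(hi / lo)
        cum.append(cum[-1] + area)
    return cum

def extreme_value_pdf(masses, volume):
    """Equations (9)-(11) inserted into equation (3), on the mass grid."""
    cum = cumulative_number_density(masses)
    n_tot = cum[-1]                        # eq. (11), truncated to the grid
    n_haloes = n_tot * volume              # N = n_tot V
    pdf = []
    for m, c in zip(masses, cum):
        f = dn_dm(m) / n_tot               # eq. (9)
        big_f = c / n_tot                  # eq. (10)
        # F^(N-1) evaluated in log space, since N can be very large
        log_pow = (n_haloes - 1.0) * math.log(big_f) if big_f > 0.0 else -math.inf
        pdf.append(n_haloes * f * math.exp(log_pow))
    return pdf, n_haloes

masses = log_grid(1.0e12, 1.0e17, 4000)        # lower limit is an assumed mass cut
volume = (4.0 / 3.0) * math.pi * 100.0 ** 3    # sphere of radius 100 length units
pdf, n_haloes = extreme_value_pdf(masses, volume)

# Sanity check: the pdf of the most massive halo should integrate to about 1.
norm = sum(0.5 * (p0 * m0 + p1 * m1) * math.log(m1 / m0)
           for p0, p1, m0, m1 in zip(pdf[:-1], pdf[1:], masses[:-1], masses[1:]))
peak_mass = masses[pdf.index(max(pdf))]
print(n_haloes, norm, peak_mass)
```

Replacing `dn_dm` with a tabulated mass function leaves the rest of the sketch unchanged, because only $dn/dM$ and its cumulative integral enter.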

The form of the halo mass distribution in ΛCDM and
alternative cosmologies can also be examined; as an example
of deviations from ΛCDM we include the effects of primordial
non-Gaussianity. The halo mass function has long
been known to be sensitive to the presence of primordial
non-Gaussianity (Lucchin & Matarrese 1988) and these
effects have been replicated within N-body simulations
(Grossi et al. 2009; Pillepich et al. 2010). We include
non-Gaussianity in the model via the non-Gaussian correction
factor $\mathcal{R}(f_{\rm NL})$ of Lo Verde et al. (2008) (LMSV):



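$$\mathcal{R}_{\rm LMSV}(f_{\rm NL}) = 1 + \frac{1}{6}\frac{\sigma^2}{\delta_c}\left[S_3\left(\frac{\delta_c^4}{\sigma^4} - \frac{2\delta_c^2}{\sigma^2} - 1\right) + \frac{dS_3}{d\ln\sigma}\left(\frac{\delta_c^2}{\sigma^2} - 1\right)\right], \tag{12}$$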



where $S_3$ is the normalised skewness of the matter density
field, for which we use the approximation:

$$S_3 \approx 3 \times 10^{-4} f_{\rm NL}\,\sigma^{-1}, \tag{13}$$



given by equation (2.7) of Enqvist et al. (2011). The choice
of the LMSV version is motivated by Figure 1, in which we
plot three methods of including primordial non-Gaussianity
in the halo mass function: the $\mathcal{R}(f_{\rm NL})$ correction factors of
LMSV and Matarrese et al. (2000) (MVJ) and the analytically
applied non-Gaussianity of Maggiore & Riotto (2010)
(MR), all applied to the $f_{\rm NL} = 0$ MR mass function. As
can be seen (and as observed by Enqvist et al. (2011) when
applied to the Tinker et al. (2008) mass function), the MVJ
correction factor leads to a divergence in the mass function
in the high-mass limit, which in this analysis we are still
required to integrate over. By applying non-Gaussianity to
the MR mass function we can explicitly see that it is the
$\mathcal{R}(f_{\rm NL})$ factor which leads to this divergence, rather than
the mass function itself.

Figure 1. Halo mass functions with non-Gaussianity applied
using the prescriptions of Maggiore & Riotto (2010) (MR),
Matarrese et al. (2000) (MVJ) and Lo Verde et al. (2008)
(LMSV), showing the divergence of the MVJ prescription.

In order to evaluate the efficacy of
this formulation of the extreme value statistics of the halo
mass function, we compare the extreme value pdf calculated
from equation (3) to Monte Carlo simulations of the most massive
halo in a universe with a given mass function. In each cosmology,
we construct an ensemble of realisations of the halo
mass function; each realisation is constructed by calculating
the expected number of haloes in a bin of width $\Delta \log m$
and drawing from a Poisson distribution with this mean.
The drawn value is then taken as the number of haloes in
this bin for this realisation, generating a mock catalogue of
uncorrelated haloes in the volume $V$. The largest cluster
mass for the realisation is determined as the central value of
the highest occupied bin (which is always singly occupied).
The distribution of highest-mass cluster in each catalogue is 
then recorded over 10'' realisations. 
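The procedure above can be sketched as follows. The example is a simplified illustration, not the code used for the results in this paper: the exponential-tail form of $n(>m)$, its constants, the bin width and the number of realisations are all assumptions. It also uses one shortcut, stated plainly: since only the highest occupied bin matters, each bin's occupancy is drawn directly with probability $1 - e^{-\mu}$ (the chance that a Poisson count of mean $\mu$ is non-zero) rather than drawing the full count, and the bin centre is taken in $\log m$.

```python
import math
import random

# Toy cumulative number density n(>m): an exponential tail. This is an
# assumption for illustration only, not a cosmological mass function.
N_STAR = 1.0e-3     # hypothetical normalisation (haloes per unit volume)
M_STAR = 1.0e14     # hypothetical mass scale

def n_above(m):
    return N_STAR * math.exp(-m / M_STAR)

def make_bins(log_m_lo, log_m_hi, d_log_m):
    """Bins of width d_log_m in log10(m), ordered from high to low mass."""
    n_bins = int(round((log_m_hi - log_m_lo) / d_log_m))
    edges = [log_m_lo + i * d_log_m for i in range(n_bins + 1)]
    return [(edges[i], edges[i + 1]) for i in reversed(range(n_bins))]

def most_massive_halo(bins, volume, rng):
    """One realisation: central (log) mass of the highest occupied bin."""
    for lo, hi in bins:
        mean = volume * (n_above(10.0 ** lo) - n_above(10.0 ** hi))
        # P(Poisson count > 0) = 1 - exp(-mean); expm1 keeps small means accurate
        if rng.random() < -math.expm1(-mean):
            return 10.0 ** (0.5 * (lo + hi))
    return None  # no halo anywhere in the mass range

def exact_cdf(m, volume, m_min):
    """F^N(m) from equation (2), with F = 1 - n(>m)/n_tot and N = n_tot V."""
    n_tot = n_above(m_min)
    big_f = 1.0 - n_above(m) / n_tot
    return big_f ** (n_tot * volume)

rng = random.Random(42)                       # fixed seed for repeatability
volume = (4.0 / 3.0) * math.pi * 100.0 ** 3   # sphere of radius 100 length units
bins = make_bins(13.0, 16.0, 0.01)
maxima = sorted(most_massive_halo(bins, volume, rng) for _ in range(4000))
median_mc = maxima[len(maxima) // 2]
print(median_mc, exact_cdf(median_mc, volume, 10.0 ** 13.0))
```

If the exact distribution describes the simulated catalogues, the exact cumulative probability evaluated at the Monte Carlo median should be close to 0.5, up to the bin width and sampling noise.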



3 RESULTS AND COMPARISONS WITH 
OTHER WORK 

Figure 2 shows the results of the above procedure for the
Sheth & Tormen (1999) mass function with WMAP7 cosmological
parameters. Plotted are Monte Carlo results with
Poisson errors, the exact extreme value distribution calculated
using equation (3), and asymptotic Type-I (Gumbel) and GEV
distributions fitted using a maximum likelihood method. It
can be seen that the predictions of the exact extreme value
distribution (3) match the results of the Monte Carlo
simulations well. As can be expected, including the extra degree
of freedom of the shape parameter $\gamma$ greatly improves the
fit of the GEV distribution over the Type-I.
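The gap between exact and asymptotic distributions can be illustrated for the Gaussian case discussed in Section 2.1. The sketch below is not the maximum likelihood fit used for Figure 2; it is a simpler comparison, under the assumption of a unit Gaussian underlying distribution, between the exact cumulative distribution $F^N$ of equation (2) and the Type-I (Gumbel) limit, with the rescaling constants taken from equations (4) and (5). The sample sizes and the grid of rescaled values are arbitrary choices.

```python
import math

SQRT2 = math.sqrt(2.0)

def survival(x):
    """1 - F(x) for a unit Gaussian; erfc keeps precision in the far tail."""
    return 0.5 * math.erfc(x / SQRT2)

def density(x):
    """f(x) for a unit Gaussian."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norming_constants(n):
    """(a_N, b_N) from equations (4) and (5), with b_N found by bisection."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid) > 1.0 / n:
            lo = mid
        else:
            hi = mid
    b_n = 0.5 * (lo + hi)
    a_n = survival(b_n) / density(b_n)    # reciprocal hazard r(b_N)
    return a_n, b_n

def exact_cdf(x, n, a_n, b_n):
    """F^N evaluated at m = b_N + a_N x (equation 2 in rescaled units)."""
    m = b_n + a_n * x
    return math.exp(n * math.log1p(-survival(m)))

def gumbel_cdf(x):
    """The gamma = 0 (Type-I) member of the GEV family."""
    return math.exp(-math.exp(-x))

def max_cdf_difference(n, x_lo=-3.0, x_hi=8.0, steps=221):
    """Largest |exact - Gumbel| over a grid of rescaled values x."""
    a_n, b_n = norming_constants(n)
    xs = [x_lo + i * (x_hi - x_lo) / (steps - 1) for i in range(steps)]
    return max(abs(exact_cdf(x, n, a_n, b_n) - gumbel_cdf(x)) for x in xs)

sample_sizes = (10**2, 10**4, 10**8, 10**16)
differences = [max_cdf_difference(n) for n in sample_sizes]
for n, d in zip(sample_sizes, differences):
    print(n, d)
```

The printed differences shrink only gradually as $N$ grows by many orders of magnitude, which is why a fitted shape parameter at finite $N$ need not reflect the asymptotic type.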

Figure 3 shows the convergence of the shape parameter
$\gamma$ for a variety of spherical volumes and values of the
non-Gaussianity parameter $f_{\rm NL}$. Values of $\gamma$ are estimated with
a maximum likelihood method and error bars represent 95%
confidence intervals. As can be seen, whilst the shape parameter
appears well converged for volumes with $r > 30\,h^{-1}$ Mpc,
there is enough statistical noise to wash out
any potential detection of $f_{\rm NL} \lesssim 300$ by using $\gamma$ as a test
statistic, even in this simple case with uncorrelated haloes.

Davis et al. (2011) also consider the extreme value statistics
of the halo mass function, forming the extreme value
distribution as the differential of the void probability:

$$\phi^{\rm void}(M_{\rm max} = m) = \frac{dP_0(m)}{dm}, \tag{14}$$

where, in the Poisson limit, the void probability is given by:

$$P_0(m) = \exp\left(-n(>m)\,V\right). \tag{15}$$



Harrison & Coles

Figure 2. The extreme value distributions for the Sheth-Tormen
halo mass function (panel title: $r = 100\,h^{-1}$ Mpc, $f_{\rm NL} = 0$).
Shown are the exact distribution and two
best-fitting asymptotic distributions: a Type-I (Gumbel,
dash-dotted) distribution and a general extreme value distribution with
free $\gamma$ parameter (GEV, dashed).

Figure 3. The shape parameter $\gamma$ for different volumes and values
of $f_{\rm NL}$, estimated using a maximum likelihood method and
with 95% error bars. Points for $f_{\rm NL} = 100$ and $f_{\rm NL} = 300$ are
horizontally offset by $+2.5$ and $+5\,h^{-1}$ Mpc respectively. Convergence
appears to be sufficient for volumes with $r > 30\,h^{-1}$ Mpc and $\gamma$ appears
to be poor at discriminating between different values of $f_{\rm NL}$.

Figure 4. Comparison of Davis et al. (2011) (DDCSP) and this
work (panel title: $r = 20\,h^{-1}$ Mpc, $f_{\rm NL} = 0$), showing the
agreement of both methods of determining the
extreme value statistics of the halo mass function. The dotted line
represents the DDCSP version with halo correlations included.

Shown in Figure 4 is the comparison between the extreme
value distributions calculated using equations (14)
and (3), showing excellent agreement for the case of
uncorrelated haloes, as is to be expected. The method of
Davis et al. (2011) can be readily modified to account for
correlated, biased haloes, primarily because of the simple
form taken by the effects of correlations on the void probability,
but it remains a future endeavour to include these effects in
the exact model. However, the agreement of the extreme value
distributions at the high-mass end in the cases of both correlated
and uncorrelated haloes means that meaningful inferences
on the likelihoods of the most massive clusters may still be
drawn from the simple uncorrelated models.
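The agreement for uncorrelated haloes follows from the definitions. The short derivation below is a sketch, assuming the limit of large $N = n_{\rm tot} V$ at fixed $n(>m)\,V$, where $n(>m) = n_{\rm tot}\,[1 - F(m)]$ is the number density of haloes above mass $m$ implied by equations (10) and (11); it is offered as an illustration rather than a result quoted from either paper.

```latex
\begin{align}
\Phi(M_{\rm max} \le m; N) = F^{N}(m)
  &= \left[1 - \frac{n(>m)}{n_{\rm tot}}\right]^{n_{\rm tot} V} \nonumber \\
  &\longrightarrow \exp\left(-n(>m)\,V\right) = P_0(m)
  \qquad (n_{\rm tot} V \to \infty), \nonumber \\
\frac{dP_0(m)}{dm}
  &= V\,\frac{dn(m)}{dm}\,\exp\left(-n(>m)\,V\right), \nonumber \\
N f(m)\,[F(m)]^{N-1}
  &= V\,\frac{dn(m)}{dm}\,\left[1 - \frac{n(>m)}{n_{\rm tot}}\right]^{n_{\rm tot} V - 1}. \nonumber
\end{align}
```

The last two lines differ only by the replacement of the finite power with its exponential limit, so the two densities coincide wherever $n_{\rm tot} V \gg 1$, which holds for any realistic cosmological volume.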



4 DISCUSSION AND CONCLUSIONS 

We have explored an avenue towards the construction of
the exact extreme value distribution of halo masses which does not entail
the assumption that the distribution belongs to one of the
asymptotic types discussed in the classical literature of extreme
value statistics. Using both analytical and numerical
techniques we have shown that there can be significant differences
between the exact and asymptotic distributions and,
in particular, that the shape parameter $\gamma$ is unlikely to
provide an effective statistical discriminator between Gaussian
and non-Gaussian theories of structure formation.

The approach we have taken relies on accurate knowledge
of the behaviour of the underlying distribution for
large halo masses. Even for the case of Gaussian initial
conditions (i.e. $f_{\rm NL} = 0$) there is some theoretical uncertainty
in what this behaviour actually is. There exist a number
of plausible halo mass functions in the literature (e.g.
Sheth & Tormen 1999; Jenkins et al. 2001; Reed et al. 2003;
Tinker et al. 2008), all of which have differing tail behaviour,
and the level of indeterminacy worsens when we consider
non-Gaussian models, as discussed in Section 2.

Nevertheless, analytical approaches like those discussed
in this paper will certainly play an important role in this area
for some considerable time. The most massive haloes are so
rare that probing them using numerical techniques will require
enormous volumes to be simulated with sufficient resolution
to obtain accurate halo masses whilst at the same
time avoiding boundary artifacts. For example, in order to
determine the probability distribution of the most massive
cluster in the Hubble volume we would need an ensemble
of simulations, each so large that it would comprise a large
number of independent Hubble volumes. Faced with the significant
computational cost of such a programme, there can
be no doubt that analytical theory, calibrated by smaller-scale
simulations, will be the principal theoretical tool by
which extreme objects will be studied. We will adopt this
approach in future work.

The use of extreme value statistics as described in this
work also has an advantage over studies which seek to use
rare objects to constrain the mass function of clusters $n(M)$
(e.g. Vikhlinin et al. 2009; Allen et al. 2011) in that, in the
EVS approach, a given object can always set a lower limit
on the global extremum. This avoids the difficulty (in addition
to the determination of cluster mass) of defining in
an unbiased way precisely what volume is being probed, a
process vulnerable to a posteriori selection effects.